Hybridization Based Machine Translations for Low-Resource Language with Language Divergence

Abstract

This paper presents a machine translation system from Sanskrit to Hindi that uses a hybridised form of direct and rule-based language processing. The divergence between Sanskrit and Hindi is also discussed, along with a proposition for how to handle it. Sanskrit-Hindi bilingual dictionaries, grammatical corpus analyses, and a rule base have all been used in the proposed system. The system's ability to access data from the various vocabularies and rule bases utilised in its expansion has been improved by the use of Elasticsearch. Additionally, a novel technique that builds a parse tree from a parsing table is presented. It processes the input sentence using the Context Free Grammar normal form approach. No standard corpora are available for Sanskrit-Hindi translation; the corpora and the specific dictionaries used were designed and developed for the proposed work. The system achieved a Bilingual Evaluation Understudy (BLEU) score of 51.6 percent when tested with Python's Natural Language Toolkit API. According to the comparison, it performs better than current cutting-edge systems.
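The abstract does not spell out how the parse tree is derived from the parsing table. As a rough illustration only, the sketch below uses the standard CYK algorithm, which fills a table over a grammar in Chomsky normal form and reads the tree back from it. The toy grammar, the transliterated words, and the function name `cyk_parse` are all assumptions made for this example; none of them come from the paper, and the authors' technique may differ.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (NOT the paper's grammar).
# Binary rules map a pair of child symbols to their parent symbol.
BINARY_RULES = {
    ("NP", "VP"): "S",   # S  -> NP VP
    ("NP", "V"): "VP",   # VP -> NP V  (object before verb, as in SOV order)
}
# Lexical rules map a transliterated word to its preterminal symbol.
LEXICAL_RULES = {
    "ramah": "NP",
    "phalam": "NP",
    "khadati": "V",
}


def cyk_parse(tokens):
    """Fill a CYK table and return a parse tree rooted at S, or None."""
    n = len(tokens)
    # table[(i, j)] maps a nonterminal to one tree covering tokens[i:j].
    table = defaultdict(dict)

    # Length-1 spans come straight from the lexicon.
    for i, token in enumerate(tokens):
        symbol = LEXICAL_RULES.get(token)
        if symbol is not None:
            table[(i, i + 1)][symbol] = (symbol, token)

    # Longer spans combine two adjacent, already-filled sub-spans.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, left_tree in table[(i, k)].items():
                    for right, right_tree in table[(k, j)].items():
                        parent = BINARY_RULES.get((left, right))
                        if parent and parent not in table[(i, j)]:
                            table[(i, j)][parent] = (parent, left_tree, right_tree)

    return table[(0, n)].get("S")


tree = cyk_parse(["ramah", "phalam", "khadati"])
print(tree)
```

Each table cell stores the subtree along with the symbol, so the full tree for the sentence is available from the top cell once the table is filled. A sentence the toy grammar cannot derive yields `None`.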

Similar resources

Sublexical Translations for Low-Resource Language

Machine Translation (MT) for low-resource language has low-coverage issues due to Out-Of-Vocabulary (OOV) words. In this research we propose a method using sublexical translation to achieve wide coverage in Example-Based Machine Translation (EBMT) for English to Bangla language. For sublexical translation we divide the OOV words into sublexical units for getting translation candidates. Previous ...

Xhosa-English Machine Translation: Working with a Low-Resource Language

This report details the author’s experiences as a Distributed Research Experience for Undergraduates (DREU) summer research intern at Carnegie Mellon University’s Language Technologies Institute. Under the guidance of Prof. Carolyn Rosé, the author attempted to implement a phrase-based translation (i.e., statistical machine translation, or SMT) system for translating Xhosa text into English usi...

Machine Translation, Language Divergence and Lexical Resources

The key concern in machine translation, whose purpose is to convert documents from one language to another, is the language divergence problem. This problem arises from the fact that languages make different lexical and syntactic choices for expressing an idea. Language divergence needs to be tackled not only for translating between language pairs from distant families (e.g., English and Japa...

Example-Based Machine Translation for Low-Resource Language Using Chunk-String Templates

Example-Based Machine Translation (EBMT) for low resource language, like Bengali, has low-coverage issues, due to the lack of parallel corpus. In this paper, we propose an EBMT for low resource language, using chunk-string templates (CSTs) and translating unknown words. CSTs consist of a chunk in source-language, a string in target-language, and word alignment information. CSTs are prepared aut...

Selection Criteria for Low Resource Language Programs

This paper documents and describes the criteria used to select languages for study within programs that include low resource languages whether given that label or another similar one. It focuses on five US common task, Human Language Technology research and development programs in which the authors have provided information or consulting related to the choice of language. The paper does not des...

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2022

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3571742